Computation Error Analysis of Block Floating Point Arithmetic Oriented Convolution Neural Network Accelerator Design

Authors

  • Zhourui Song
  • Zhenyu Liu
  • Chunlu Wang
  • Dongsheng Wang
Abstract

The heavy burdens of computation and off-chip traffic impede the deployment of large-scale convolutional neural networks (CNNs) on embedded platforms. Since CNNs exhibit strong tolerance to computation errors, employing block floating point (BFP) arithmetic in CNN accelerators can efficiently reduce hardware cost and data traffic while maintaining classification accuracy. In this paper, we verify the effect of BFP word-width choices on CNN performance without retraining. Several typical CNN models, including VGG16, ResNet-18, ResNet-50 and GoogLeNet, were tested. Experiments revealed that an 8-bit mantissa, including the sign bit, in the BFP representation induced less than 0.3% accuracy loss. In addition, we analyze the computational errors theoretically and derive an upper bound on the noise-to-signal ratio (NSR), which provides promising guidance for BFP-based CNN engine design.
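
As a rough illustration of the BFP scheme discussed above, the sketch below quantizes a block of values to a shared exponent with an 8-bit signed mantissa and measures the resulting noise-to-signal ratio (NSR). The block size, exponent selection, and rounding mode here are illustrative assumptions, not the exact scheme of the paper.

```python
import numpy as np

def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of values to block floating point (BFP):
    one shared exponent per block, signed mantissas of `mantissa_bits`
    bits (sign bit included). Illustrative sketch only."""
    max_abs = np.max(np.abs(block))
    if max_abs == 0.0:
        return np.zeros_like(block), 0
    # Shared exponent chosen so the largest magnitude fits the mantissa range.
    shared_exp = int(np.floor(np.log2(max_abs))) + 1
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    # Round to the nearest representable mantissa and clip to the signed range.
    mantissa = np.clip(np.round(block / scale),
                       -2 ** (mantissa_bits - 1),
                       2 ** (mantissa_bits - 1) - 1)
    return mantissa * scale, shared_exp

# Example: quantization error of one activation block.
x = np.random.randn(64).astype(np.float64)
x_q, exp = bfp_quantize(x, mantissa_bits=8)
nsr = np.sum((x - x_q) ** 2) / np.sum(x ** 2)  # noise-to-signal ratio
print(f"shared exponent = {exp}, NSR = {nsr:.2e}")
```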


Similar resources

Improving accuracy of Winograd convolution for DNNs

Modern deep neural networks (DNNs) spend a large amount of their execution time computing convolutions. Winograd's minimal algorithm for small convolutions can greatly reduce the number of arithmetic operations. However, a large reduction in floating point (FP) operations in these algorithms can result in significantly reduced FP accuracy of the result. In this paper we propose several methods ...
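
A minimal 1-D example of Winograd's minimal filtering, F(2,3), is sketched below: two outputs of a 3-tap filter are computed with four multiplications instead of six, using the standard transform matrices. This only illustrates the general approach; it is not the accuracy-improvement method proposed in that paper.

```python
import numpy as np

# Winograd F(2,3): two outputs of a 3-tap filter with 4 multiplies instead of 6.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float64)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float64)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 filter outputs."""
    return AT @ ((G @ g) * (BT @ d))  # element-wise product: 4 multiplies

d = np.random.randn(4)
g = np.random.randn(3)
direct = np.array([d[0:3] @ g, d[1:4] @ g])  # direct correlation for reference
print(np.max(np.abs(winograd_f23(d, g) - direct)))  # ~1e-16 in float64
```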


Hardware-oriented Approximation of Convolutional Neural Networks

High computational complexity hinders the widespread usage of Convolutional Neural Networks (CNNs), especially in mobile devices. Hardware accelerators are arguably the most promising approach for reducing both execution time and power consumption. One of the most important steps in accelerator development is hardware-oriented model approximation. In this paper we present Ristretto, a model app...


Block Floating Point Interval ALU for Digital Signal Processing

Numerical analysis on real numbers is performed using either pointwise arithmetic or interval arithmetic. Today, interval analysis is a mature discipline and finds use not only in error analysis but also in control applications such as robotics. The interval arithmetic hardware architecture proposed in the work [W. Edmonson, R. Gupte, J. Gianchandani, S. Ocloo, W. Alexander, Interval arithmetic lo...
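
As a small illustration of interval arithmetic (unrelated to the specific ALU design in that paper), the sketch below propagates lower and upper bounds through addition and multiplication, which is the basic mechanism that interval-based error analysis relies on.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # Sum of intervals: add the bounds component-wise.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # Product of intervals: take the min/max over all corner products.
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

# Bound x*y + z when each operand carries some uncertainty.
x, y, z = Interval(1.9, 2.1), Interval(-0.3, -0.1), Interval(0.95, 1.05)
print(x * y + z)  # Interval(lo=0.32, hi=0.86)
```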


An FPGA-Based Face Detector Using Neural Network and a Scalable Floating Point Unit

The study implemented an FPGA-based face detector using neural networks and a scalable floating-point arithmetic unit (FPU). The FPU provides dynamic range and reduces the bit width of the arithmetic unit compared with a fixed-point approach. These features reduce memory usage, making the design efficient for neural network systems with large data word sizes. The arithmetic unit occupies 39~45% of ...


Flexible On-chip Memory Architecture for DCNN Accelerators

Recent studies show that as the depth of convolutional neural networks (CNNs) increases for higher performance on different machine learning tasks, a major bottleneck for deep CNN (DCNN) processing is the traffic between the accelerator and off-chip memory. However, current state-of-the-art accelerators cannot effectively reduce off-chip feature map traffic due to the limited ...



Journal:
  • CoRR

Volume: abs/1709.07776

Pages: -

Publication date: 2017